Abstract

Nested error regression models are useful tools for analysis of grouped data, 
especially in the case of small area estimation. This paper suggests a nested 
error regression model using uncertain random effects in which the random effect 
in each area is expressed as a mixture of a normal distribution and a positive 
mass at 0. For estimation of the model parameters and prediction of the random 
effects, an objective Bayesian inference is proposed by setting non-informative 
prior distributions on the model parameters. Under mild sufficient conditions, it 
is shown that the posterior distribution is proper and the posterior variances are 
finite, confirming the validity of posterior inference. To generate samples from
the posterior distribution, we provide the Gibbs sampling method with familiar 
forms for all the full conditional distributions. This paper also addresses the 
problem of predicting finite population means, and a sampling-based method 
is suggested to tackle this issue. Finally, the proposed model is compared with 
the conventional nested error regression model through simulation and empirical 
studies. 

Keywords: Bayesian estimator, nested error regression model, posterior 
propriety, small area estimation, uncertain random effect 


1. Introduction 

Linear mixed models and model-based estimators, including the empirical
Bayes (EB) estimator and the empirical best linear unbiased predictor (EBLUP),
have been studied extensively in the literature from both theoretical and applied
points of view. Small area estimation is an important application of these
methods, and it has received much attention in recent years owing to a growing
demand for reliable small area estimates. For good reviews of this topic, see
Ghosh and Rao, Rao and Molina, Datta and Ghosh, and Pfeffermann. The linear
mixed models used for small area estimation fall into two major types: the
Fay-Herriot model suggested by Fay and Herriot for area-level data, and the
nested error regression (NER) model given in Battese, Harter and Fuller for
unit-level data. The resulting model-based estimators, such as EB or EBLUP, for
small-cluster means or subject-specific values provide reliable estimates with
higher precision than direct estimates like sample means. This stability is due
to the random effects, but misspecification of the random effects may increase
the prediction risk.

Concerning this issue, Datta, Hall and Mandal recently suggested inference
based on testing for the presence of random effects in general mixed models. They
pointed out that if the random effects can be dispensed with, the model parameters
and the small area means may be estimated with substantially higher accuracy.
Further, Datta and Mandal generalized the idea of preliminary testing to
uncertain random effects in the Fay-Herriot model, which is described as

$$ y_i = \theta_i + \varepsilon_i, \quad \theta_i = x_i^\top\beta + u_i v_i, \quad i = 1,\ldots,m, $$

where $\varepsilon_i \sim \mathcal{N}(0, D_i)$ for known $D_i$, $v_i \sim \mathcal{N}(0, A)$ and
$\Pr(u_i = 1) = p = 1 - \Pr(u_i = 0)$. In Datta and Mandal, the term $u_i v_i$ is
called the "uncertain random effect" since the distribution of $u_i v_i$ is a
mixture of $\mathcal{N}(0, A)$ and the one-point distribution on 0. The mixture
expression of the distribution of random effects controls the extent of the
random effects, so that flexible prediction can be achieved. Indeed, the
resulting estimator (predictor) of $\theta_i$ is expressed as a linear combination
of the direct estimator $y_i$ and the regression estimator $x_i^\top\beta$. The weight
depends on the squared residual $(y_i - x_i^\top\beta)^2$, whereas the weight in the
estimator from the traditional Fay-Herriot model does not take the residuals
into account. In Datta and Mandal, the Bayesian method was implemented for
inference on the small area parameters as well as the model parameters by
setting proper prior distributions for $p$ and $A$, namely
$p \sim Beta(a_1, a_2)$ and $A \sim IG(a_3, a_4)$ for known (user specified) $a_i$,
$i = 1, 2, 3, 4$, and the improper uniform prior for $\beta$, where $Beta(a_1, a_2)$
and $IG(a_3, a_4)$ denote the beta and inverse gamma distributions, respectively.
It was shown that the resulting posterior distributions of all the parameters
are proper under some conditions. However, Datta and Mandal focused on the
Fay-Herriot model, and their method could be restrictive in real applications.
Moreover, they used proper (informative) prior distributions for both $p$ and
$A$, and the result could be affected by the choice of hyperparameters.

In this paper, we treat not only uncertain random effects in more general
small area models such as the NER model, but also non-informative prior
distributions for the model parameters. The NER model has been used in various
applications including small area estimation, biological experiments and
econometric analysis. The NER model is described as

$$ y_{ij} = x_{ij}^\top\beta + v_i + \varepsilon_{ij}, \quad j = 1,\ldots,n_i, \quad i = 1,\ldots,m, $$

where $\varepsilon_{ij}$ is the sampling error associated with $y_{ij}$ and $v_i$ is a
random effect in the $i$th area. It is usually assumed that $\varepsilon_{ij}$ and $v_i$
are mutually independent and distributed as $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$ and
$v_i \sim \mathcal{N}(0, \tau^2)$, respectively. The main purpose of the NER model is to
predict (estimate) linear combinations of $\beta$ and $v_i$, namely
$\mu_i = c_i^\top\beta + v_i$ for some known vector $c_i$. Over the past decade, the NER
model has been criticized on the grounds that its assumptions are not
necessarily satisfied in real applications, and several extensions have been
proposed to adapt it to real data sets. For example, Jiang and Nguyen,
Kubokawa, Sugasawa, Ghosh and Chaudhuri, and Sugasawa and Kubokawa proposed
heteroscedastic nested error regression models in which the variance
components $\sigma^2$ and $\tau^2$ are not constant over the areas. Also, Ghosh, Sinha
and Kim, Arima, Datta and Liseo, and Torabi introduced extended models with
measurement errors in covariates. However, to our knowledge, the problem of
uncertainty of random effects has not been considered so far in this context.

In this article, we suggest the use of the uncertain random effect in the
NER model and propose the uncertain nested error regression (UNER) model
by adopting the structure

$$ v_i \mid u_i \sim \mathcal{N}(0, u_i\tau^2) \quad \text{with} \quad \Pr(u_i = 1) = p. $$

For the prior distribution of $\tau^2$, the variance of the random effects, we use
a prior distribution depending on the $u_i$'s, which is defined as

$$ \pi(\tau^2 \mid z > a) \propto (\tau^2)^{-1/2}, \quad \pi(\tau^2 \mid z \le a) \propto \pi_*(\tau^2), $$

for some $a > 0$, where $z = \sum_{i=1}^m u_i$ and $\pi_*(\tau^2)$ is some proper density,
so that the prior distribution of $\tau^2$ is more non-informative than a proper
prior such as the inverse gamma distribution used in Datta and Mandal. For
the other parameters $\beta$, $\sigma^2$ and $p$, we also assign the non-informative prior
$\pi(\beta, \sigma^2, p) \propto (\sigma^2)^{-1/2} p^{-1/2} (1-p)^{-1/2}$. Hence, our Bayesian
procedure is objective.

We also apply the UNER model in the framework of a finite population to
predict the true finite population mean based on the partially observed data in
each population.

This article is organized as follows. In Section 2, we describe the details of the
UNER model and provide the Bayesian estimation method as well as the main
theorem regarding the propriety of the posterior distribution and the finiteness
of posterior variances. The prediction problem of finite population means using
UNER is also discussed. In Section 3, we compare the UNER model with the
NER model through simulation and empirical studies. Concluding remarks are
given in Section 4 and the technical proof is given in the Appendix.


2. Uncertain Nested Error Regression Models

2.1. Model settings and Bayes estimator 

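We consider the following uncertain nested error regression (UNER) model

$$ y_{ij} = x_{ij}^\top\beta + v_i + \varepsilon_{ij}, \quad j = 1,\ldots,n_i, \quad i = 1,\ldots,m, \tag{2.1} $$
$$ v_i \mid (u_i = 1) \sim \mathcal{N}(0, \tau^2), \quad v_i \mid (u_i = 0) \sim \delta_0(v_i), $$

independently for $i$ with $\Pr(u_i = 1) = 1 - \Pr(u_i = 0) = p$, where $x_{ij}$ is a
$q$-dimensional vector of covariates, $\beta$ is a $q$-dimensional vector of regression
coefficients, $\delta_0(\cdot)$ denotes the Dirac measure on 0, and the $\varepsilon_{ij}$'s are
independently and identically distributed as $\mathcal{N}(0, \sigma^2)$. The marginal density
function of $v_i$ is given by

$$ f(v_i) = p\,\phi(v_i; 0, \tau^2) + (1-p)\,\delta_0(v_i), $$

with $\phi(\cdot; 0, \tau^2)$ the $\mathcal{N}(0, \tau^2)$ density, which is a mixture of the normal
distribution $\mathcal{N}(0, \tau^2)$ and the point mass on 0. Thus the model parameters
are the regression coefficients $\beta$, the variance components $\sigma^2$ and $\tau^2$, and
the mixture ratio $p$. Let $y_i = (y_{i1},\ldots,y_{in_i})^\top$ be the observed vector in
the $i$th area. Then the variance of $y_i$ is
$\mathrm{Var}(y_i) = \sigma^2 I_{n_i} + p\tau^2 J_{n_i}$ for $J_{n_i} = 1_{n_i} 1_{n_i}^\top$. If the prior
probability $p$ of $u_i = 1$ is 0, it follows that $\mathrm{Var}(y_i) = \sigma^2 I_{n_i}$ and the
observations in the $i$th area are mutually independent.

The parameter which we want to estimate (predict) is $\mu_i = c_i^\top\beta + v_i$ for a
known vector $c_i$. A typical choice of $c_i$ is $\bar{x}_i = n_i^{-1}\sum_{j=1}^{n_i} x_{ij}$, in
which case $\mu_i$ corresponds to the mean of the $i$th area.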

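The posterior distribution of $\mu_i$ given $u_i$ and $y_i$ is

$$ \mu_i \mid u_i, y_i \sim \mathcal{N}\left( c_i^\top\beta + \frac{n_i\tau^2 I(u_i = 1)}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), \ \frac{I(u_i = 1)\sigma^2\tau^2}{\sigma^2 + n_i\tau^2} \right), $$

where $\bar{y}_i = n_i^{-1}\sum_{j=1}^{n_i} y_{ij}$, the sample mean of $y_{ij}$ in the $i$th area.
Thus the posterior distribution of $\mu_i$ given $y_i$ is a mixture of the normal
distribution and a one-point mass on $c_i^\top\beta$. The resulting Bayes estimator
$\tilde{\mu}_i$ of $\mu_i$ is

$$ \tilde{\mu}_i = E[\mu_i \mid y_i] = \tilde{p}_i\left\{ c_i^\top\beta + \frac{n_i\tau^2}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta) \right\} + (1 - \tilde{p}_i)c_i^\top\beta = c_i^\top\beta + \frac{n_i\tau^2\tilde{p}_i}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), $$

where $\tilde{p}_i$ is the posterior probability of $u_i = 1$ given by

$$ \tilde{p}_i = \Pr(u_i = 1 \mid y_i) = \frac{p}{p + (1-p)\sqrt{\dfrac{n_i\tau^2 + \sigma^2}{\sigma^2}}\exp\left\{ -\dfrac{n_i^2\tau^2}{2\sigma^2(\sigma^2 + n_i\tau^2)}(\bar{y}_i - \bar{x}_i^\top\beta)^2 \right\}}. \tag{2.2} $$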


We note that $\tilde{p}_i$ increases in $p$ and $(\bar{y}_i - \bar{x}_i^\top\beta)^2$. Thus, if $x_{ij}$
is a good covariate for explaining $y_{ij}$ in the $i$th area, the squared residual
$(\bar{y}_i - \bar{x}_i^\top\beta)^2$ is expected to be small, and the posterior probability
$\tilde{p}_i$ is small as well. The posterior probability $\tilde{p}_i$ is 1 when $p = 1$,
and $\tilde{p}_i$ converges to 1 as $(\bar{y}_i - \bar{x}_i^\top\beta)^2$ goes to infinity.
Moreover, the posterior variance of $\mu_i$ is expressed as

$$ V_i(y_i) = \mathrm{Var}(\mu_i \mid y_i) = \mathrm{Var}(v_i \mid y_i) = \frac{n_i^2\tau^4}{(\sigma^2 + n_i\tau^2)^2}(\bar{y}_i - \bar{x}_i^\top\beta)^2\tilde{p}_i(1 - \tilde{p}_i) + \frac{\sigma^2\tau^2\tilde{p}_i}{\sigma^2 + n_i\tau^2}. \tag{2.3} $$

It is interesting to point out that the posterior variance of $\mu_i$ in this case
depends on the observation $y_i$ through the squared residual
$(\bar{y}_i - \bar{x}_i^\top\beta)^2$ and the posterior probability $\tilde{p}_i$, while the posterior
variance of the random effect in the usual nested error regression model is
given by $\sigma^2\tau^2(\sigma^2 + n_i\tau^2)^{-1}$, which does not depend on the observation
$y_i$. This means that the uncertain random effect allows the distance between
the sample mean $\bar{y}_i$ and the synthetic estimator $\bar{x}_i^\top\beta$ to enter the
posterior variability of the parameter of interest $\mu_i$.
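
As a rough numerical illustration of (2.2), the Bayes estimator and (2.3), the
following sketch evaluates the three quantities for a single area with the model
parameters treated as known. The function name `uner_posterior_summary` and its
argument names are hypothetical and chosen only for this example; the default
`cb=None` assumes the typical choice $c_i = \bar{x}_i$.

```python
import math


def uner_posterior_summary(y_bar, xb_bar, n_i, sigma2, tau2, p, cb=None):
    """Evaluate (2.2), the Bayes estimator and (2.3) for one area.

    y_bar  : sample mean of y_ij in the area
    xb_bar : x_bar_i' beta, the synthetic (regression) estimate
    cb     : c_i' beta; defaults to xb_bar (assumption: c_i = x_bar_i)
    """
    if cb is None:
        cb = xb_bar
    resid = y_bar - xb_bar
    shrink = n_i * tau2 / (sigma2 + n_i * tau2)
    # Log of sqrt((sigma2 + n_i tau2) / sigma2) * exp(-...) appearing in (2.2);
    # note n_i^2 tau2 / (sigma2 + n_i tau2) = n_i * shrink.
    log_ratio = (0.5 * math.log((sigma2 + n_i * tau2) / sigma2)
                 - n_i * shrink * resid ** 2 / (2.0 * sigma2))
    p_tilde = p / (p + (1.0 - p) * math.exp(log_ratio))
    mu_tilde = cb + shrink * p_tilde * resid
    post_var = ((shrink * resid) ** 2 * p_tilde * (1.0 - p_tilde)
                + sigma2 * tau2 * p_tilde / (sigma2 + n_i * tau2))
    return p_tilde, mu_tilde, post_var
```

Calling the function with a zero residual gives a posterior probability below
$p$, while a large residual pushes it towards 1, in line with the remarks above.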


2.2. Bayesian implementation and posterior distribution 

Since the marginal likelihood function of the model parameters $\beta$, $\sigma^2$, $\tau^2$
and $p$ is rather complex, we consider objective Bayesian inference for the model
parameters as well as the random effect $v_i$. To this end, we rewrite the model
(2.1) as

$$ y_{ij} \mid v_i, \beta, \sigma^2 \sim \mathcal{N}(x_{ij}^\top\beta + v_i, \sigma^2), \quad j = 1,\ldots,n_i, \quad i = 1,\ldots,m, $$
$$ v_i \mid u_i, \tau^2 \sim \mathcal{N}(0, u_i\tau^2), \quad u_i \mid p \sim Ber(p), \quad i = 1,\ldots,m, \tag{2.4} $$

independently for $i$, where $Ber(p)$ denotes the Bernoulli distribution. For
implementation of full Bayesian inference, we need to set prior distributions on
the model parameters. To keep the inference objective, we use the uniform
prior distribution on $\beta$ and the Jeffreys prior distributions on $\sigma^2$ and $p$.
On the other hand, the prior distribution of $\tau^2$ should depend on
$z = \sum_{i=1}^m u_i$, since $\tau^2$ cannot be identified for a small value of $z$. Thus,
for the model parameters, we use the prior distributions

$$ \pi(\beta, \sigma^2, p) \propto (\sigma^2)^{-1/2} p^{-1/2}(1-p)^{-1/2}, \quad \pi(\tau^2 \mid z) \propto \begin{cases} (\tau^2)^{-1/2} & (z > a) \\ \pi_*(\tau^2) & (z \le a) \end{cases} \tag{2.5} $$

where $\pi_*(\tau^2) \propto (\tau^2)^{-(b_1+1)}\exp(-b_2/\tau^2)$ for known constants $b_1 > 3$
and $b_2 > 0$. The value of $a$ is chosen by the user, and this point will be
discussed later. Note that the prior distribution on $p$ is proper, but the
priors on $\beta$, $\sigma^2$ and $\tau^2$ are improper, so that posterior propriety is not
always guaranteed. In Theorem 2.1, we show that the posterior distribution for
the model parameters is proper under mild conditions.

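We now describe the posterior distribution and investigate its properties.
The set of all observed data is denoted by $D = \{y_i, X_i;\ i = 1,\ldots,m\}$ for
$X_i = (x_{i1},\ldots,x_{in_i})$, and $N = \sum_{i=1}^m n_i$ is the total sample size. From
the model (2.4) with prior setup (2.5), the posterior density of the parameters
$(v, u, \beta, \sigma^2, \tau^2, p)$ for $v = (v_1,\ldots,v_m)^\top$ and $u = (u_1,\ldots,u_m)^\top$
is given by

$$ \pi(v, u, \beta, \sigma^2, \tau^2, p \mid D) \propto (\sigma^2)^{-(N+1)/2}(\tau^2)^{-\{z + I(z > a)\}/2 - (b_1+1)I(z \le a)}\, p^{z-1/2}(1-p)^{m-z-1/2} $$
$$ \times \prod_{i=1}^m \left[ \exp\left( -\frac{\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta - v_i)^2}{2\sigma^2} - \frac{u_i v_i^2}{2\tau^2} \right)\delta_0(v_i)^{1-u_i} \right] \times \exp\left\{ -\frac{b_2}{\tau^2} I(z \le a) \right\}. \tag{2.6} $$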

Now, we state our main result on posterior propriety and the existence
of posterior variances.


Theorem 2.1. (a) The marginal posterior density $\pi(\beta, \sigma^2, \tau^2, p \mid D)$ is proper
if $N > q + 2$ and $m > a \ge 1$.

(b) The model parameters $\beta$, $\sigma^2$, $\tau^2$ and $p$ have finite posterior variances if
$N > q + 6$ and $m > a \ge 5$.


Recall that $q$ is the dimension of the vector of regression coefficients $\beta$,
and $a$ is the tuning parameter of the prior for $\tau^2$. Part (a) of Theorem 2.1 says
that the marginal posterior densities of the small area means are proper, and part
(b) provides a sufficient condition for obtaining finite measures of uncertainty
for the model parameters. We note that the conditions in Theorem 2.1 are
similar to the conditions given in Arima, et al. and Datta and Mandal.
The proof of Theorem 2.1 is presented in the Appendix.

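Since the posterior distribution in (2.6) cannot be obtained in a closed form,
we rely on the Markov chain Monte Carlo technique, in particular the Gibbs
sampler, to draw samples from the posterior distribution. This requires
generating samples from the full conditional distributions of each of
$(v, u, \beta, \sigma^2, \tau^2, p)$ given the remaining parameters and the data $D$.
Fortunately, the full conditional distributions are familiar distributions,
which allows us to implement the Gibbs sampler easily. The full conditional
distributions are given by

$$ v_i \mid u_i, \beta, \sigma^2, \tau^2, D \sim \mathcal{N}\left( \frac{n_i\tau^2 I(u_i = 1)}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), \ \frac{I(u_i = 1)\sigma^2\tau^2}{\sigma^2 + n_i\tau^2} \right), \quad i = 1,\ldots,m, $$
$$ u_i \mid \beta, \sigma^2, \tau^2, p, D \sim Ber(\tilde{p}_i), \quad i = 1,\ldots,m, \qquad p \mid u, D \sim Beta\left( z + \tfrac{1}{2}, \ m - z + \tfrac{1}{2} \right), $$
$$ \beta \mid u, \sigma^2, \tau^2, D \sim \mathcal{N}_q\left( (X^\top\Sigma^{-1}X)^{-1}X^\top\Sigma^{-1}y, \ (X^\top\Sigma^{-1}X)^{-1} \right), $$
$$ \tau^2 \mid u, v, D \sim IG\left( \tfrac{1}{2}\{z - I(z > a)\} + b_1 I(z \le a), \ \tfrac{1}{2}\sum_{i=1}^m v_i^2 + b_2 I(z \le a) \right), $$
$$ \sigma^2 \mid v, \beta, D \sim IG\left( \tfrac{1}{2}(N - 1), \ \tfrac{1}{2}(y - X\beta - Zv)^\top(y - X\beta - Zv) \right), \tag{2.7} $$

where $z = \sum_{i=1}^m u_i$, $\Sigma = \mathrm{diag}(\Sigma_1,\ldots,\Sigma_m)$ with
$\Sigma_i = \sigma^2 I_{n_i} + u_i\tau^2 J_{n_i}$, $y = (y_1^\top,\ldots,y_m^\top)^\top$,
$X = (X_1,\ldots,X_m)^\top$, $Z = \mathrm{diag}(1_{n_1},\ldots,1_{n_m})$, and $\tilde{p}_i$ is given in
(2.2). Using these expressions of the full conditional distributions, we can
easily draw posterior samples of all the variables and parameters to make
inferences, such as point estimation, prediction intervals and standard errors,
for $\mu_i = c_i^\top\beta + v_i$.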
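
The sampler described above might be sketched as follows. This is a minimal
illustration under stated assumptions rather than the authors' implementation:
the function name `gibbs_uner`, the 0-based integer area labels in `area`, the
starting values and the default values of `b1` and `b2` are all hypothetical.
One deliberate simplification is that $\beta$ is drawn conditionally on $v$ (an
ordinary regression of $y - Zv$ on $X$) instead of from the $\Sigma$-weighted
form in (2.7); since $(u_i, v_i)$ are drawn jointly given the other parameters,
this still targets the posterior (2.6).

```python
import numpy as np


def gibbs_uner(y, X, area, a=5, b1=3.5, b2=1.0, n_iter=200, seed=0):
    """Blocked Gibbs sketch for the UNER model; returns arrays of draws."""
    rng = np.random.default_rng(seed)
    N, q = X.shape
    m = int(area.max()) + 1                    # areas labelled 0, ..., m-1
    n = np.bincount(area, minlength=m)         # area sample sizes n_i
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                   # OLS starting value (assumption)
    sigma2, tau2, p = float(np.var(y - X @ beta)), 1.0, 0.5
    draws = {"beta": [], "sigma2": [], "tau2": [], "p": [], "v": []}
    for _ in range(n_iter):
        # (u_i, v_i) jointly: u_i ~ Ber(p_tilde_i) as in (2.2), then v_i | u_i
        resid_bar = np.bincount(area, weights=y - X @ beta, minlength=m) / n
        shrink = n * tau2 / (sigma2 + n * tau2)
        log_ratio = (0.5 * np.log1p(n * tau2 / sigma2)
                     - n * shrink * resid_bar ** 2 / (2.0 * sigma2))
        p_tilde = p / (p + (1.0 - p) * np.exp(log_ratio))
        u = rng.random(m) < p_tilde
        v_sd = np.sqrt(sigma2 * tau2 / (sigma2 + n * tau2))
        v = np.where(u, rng.normal(shrink * resid_bar, v_sd), 0.0)
        z = int(u.sum())
        # p | u ~ Beta(z + 1/2, m - z + 1/2)
        p = rng.beta(z + 0.5, m - z + 0.5)
        # tau2 | u, v ~ IG(shape, scale); the prior switches at z > a
        shape = 0.5 * (z - 1) if z > a else 0.5 * z + b1
        scale = 0.5 * np.sum(v ** 2) + (0.0 if z > a else b2)
        tau2 = scale / rng.gamma(shape)        # inverse gamma via 1 / gamma
        # beta | v, sigma2 (simplification: conditions on v, see lead-in)
        r = y - v[area]
        beta = rng.multivariate_normal(XtX_inv @ X.T @ r, sigma2 * XtX_inv)
        # sigma2 | v, beta ~ IG((N - 1) / 2, SSR / 2)
        e = r - X @ beta
        sigma2 = 0.5 * (e @ e) / rng.gamma(0.5 * (N - 1))
        for name, value in zip(draws, (beta, sigma2, tau2, p, v)):
            draws[name].append(value)
    return {name: np.array(values) for name, values in draws.items()}
```

In practice one would discard an initial burn-in portion of the returned draws
before computing posterior summaries.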

To close this section, we discuss the choices of $a$, $b_1$ and $b_2$ in the
prior distribution of $\tau^2$. Recall that the prior distribution of $\tau^2$ is
non-informative and improper when $z > a$, and informative and proper when
$z \le a$. Taking this into account, we should select a value of $a$ as small as
possible. Hence, it follows from Theorem 2.1 that $a = 5$ is the most reasonable
choice. On the other hand, as discussed in Datta and Mandal, a reasonable choice
is $b_1 = \hat{V} + 2$ and $b_2 = \hat{V}(\hat{V} + 1)$ such that
$E[\tau^2 \mid z \le a] = \hat{V}$ and $\mathrm{Var}(\tau^2 \mid z \le a) = \hat{V}$, where $\hat{V}$ is the
estimated sampling variance given by

$$ \hat{V} = \frac{1}{N - m - q}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{ (y_{ij} - \bar{y}_i) - (x_{ij} - \bar{x}_i)^\top\hat{\beta}_{OLS} \right\}^2. $$

Here, $\hat{\beta}_{OLS}$ is the ordinary least squares estimator of $\beta$. It should be
noted that $\hat{V}$ satisfies $E[\hat{V}] = \sigma^2$.
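
The hyperparameter rule can be checked numerically with the standard moment
formulas of the inverse gamma distribution, whose mean is $b_2/(b_1 - 1)$ and
whose variance is $b_2^2/\{(b_1-1)^2(b_1-2)\}$. The helper names below are
hypothetical and serve only as a small worked example.

```python
def tau2_hyperparameters(v_hat):
    """Return (b1, b2) from the estimated sampling variance v_hat."""
    b1 = v_hat + 2.0
    b2 = v_hat * (v_hat + 1.0)
    return b1, b2


def inverse_gamma_moments(b1, b2):
    """Mean and variance of IG(b1, b2); requires b1 > 2."""
    mean = b2 / (b1 - 1.0)
    var = b2 ** 2 / ((b1 - 1.0) ** 2 * (b1 - 2.0))
    return mean, var
```

With this rule both the prior mean and the prior variance of $\tau^2$ equal the
value passed as `v_hat`.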

2.3. Prediction in finite populations 

Here, we consider the problem of predicting the means in finite populations.
Assume that there exist $m$ finite populations and the $i$th population consists
of $N_i$ pairs of data $(Y_{ij}, x_{ij})$, $j = 1,\ldots,N_i$. It is supposed that
$n_i\,(< N_i)$ observations are sampled from the $i$th population. What we want to
predict is the mean of the $i$th finite population
$\bar{Y}_i = N_i^{-1}\sum_{j=1}^{N_i} Y_{ij}$. Assume also that the mean vector of covariates
$\bar{X}_i = N_i^{-1}\sum_{j=1}^{N_i} x_{ij}$ is available, which is often the case in real
applications (Battese, et al.). Let $s_i$ and $r_i$ be the collections of indices
of sampled and non-sampled observations in the $i$th area, respectively, so that
$s_i$ and $r_i$ satisfy $s_i \cap r_i = \emptyset$ and $s_i \cup r_i = \{1,\ldots,N_i\}$. Without
loss of generality, we assume that $s_i = \{1,\ldots,n_i\}$ and
$r_i = \{n_i + 1,\ldots,N_i\}$. The Bayes estimator of $\bar{Y}_i$ under quadratic loss is
given by

$$ E[\bar{Y}_i \mid y_i] = N_i^{-1}\left\{ n_i\bar{y}_{i(s)} + (N_i - n_i)E[\bar{Y}_{i(r)} \mid y_i] \right\}, $$

where

$$ \bar{y}_{i(s)} = n_i^{-1}\sum_{j \in s_i} y_{ij}, \qquad \bar{Y}_{i(r)} = (N_i - n_i)^{-1}\sum_{j \in r_i} Y_{ij}. $$

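For evaluating the conditional mean $E[\bar{Y}_{i(r)} \mid y_i]$, we assume that $Y_{ij}$ is
expressed as

$$ Y_{ij} = x_{ij}^\top\beta + v_i + \varepsilon_{ij}, \quad j \in r_i, $$

that is, the non-sampled observations have the same data generating structure
as the sampled ones. Then the unobserved mean $\bar{Y}_{i(r)}$ is expressed as

$$ \bar{Y}_{i(r)} = \bar{x}_{i(r)}^\top\beta + v_i + \bar{\varepsilon}_{i(r)}, $$

where $\bar{\varepsilon}_{i(r)} = (N_i - n_i)^{-1}\sum_{j \in r_i}\varepsilon_{ij}$ and $\bar{x}_{i(r)}$ is the
corresponding mean of the $x_{ij}$ over $j \in r_i$. Thus the conditional distribution
of $\bar{Y}_{i(r)}$ given $y_i$ and $u_i$ is

$$ \bar{Y}_{i(r)} \mid y_i, u_i \sim \mathcal{N}\left( \bar{x}_{i(r)}^\top\beta + \frac{n_i\tau^2 I(u_i = 1)}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), \ \frac{I(u_i = 1)\sigma^2\tau^2}{\sigma^2 + n_i\tau^2} + \frac{\sigma^2}{N_i - n_i} \right), $$

which yields the predictive density of $\bar{Y}_{i(r)}$ given by

$$ \bar{Y}_{i(r)} \mid y_i \sim \tilde{p}_i\,\mathcal{N}\left( \bar{x}_{i(r)}^\top\beta + \frac{n_i\tau^2}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), \ \frac{\sigma^2\tau^2}{\sigma^2 + n_i\tau^2} + \frac{\sigma^2}{N_i - n_i} \right) + (1 - \tilde{p}_i)\,\mathcal{N}\left( \bar{x}_{i(r)}^\top\beta, \ \frac{\sigma^2}{N_i - n_i} \right), \tag{2.8} $$

where $\tilde{p}_i$ is the posterior probability of $u_i = 1$ given in (2.2). Thus the
conditional distribution of the non-sampled mean is a mixture of two normal
distributions, with and without the random effect. Moreover, the conditional
variance of $\bar{Y}_{i(r)}$ given $y_i$ is calculated as $V_i(y_i) + (N_i - n_i)^{-1}\sigma^2$,
where $V_i(y_i)$ is the posterior variance of $v_i$ given in (2.3). It is noted that,
when the true mean vector of the explanatory variables $\bar{X}_i$ is available in
each area, the value of $\bar{x}_{i(r)}$ is easily obtained by

$$ \bar{x}_{i(r)} = (N_i - n_i)^{-1}(N_i\bar{X}_i - n_i\bar{x}_i). $$

To implement the prediction in the finite population model, we regard
$\bar{Y}_{i(r)}$ as latent variables and add the sampling step from (2.8) to the Gibbs
sampling given in (2.7).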
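
For fixed parameter values, the predictor of the finite population mean can be
sketched as below. The function name `predict_population_mean` and its
arguments are hypothetical; the sketch conditions on
$(\beta, \sigma^2, \tau^2, p)$, whereas the full Bayesian procedure would average
such predictions over posterior draws.

```python
import numpy as np


def predict_population_mean(y_s, x_s, X_bar, N_i, beta, sigma2, tau2, p):
    """E[Y_bar_i | y_i] for one area, given the model parameters.

    y_s, x_s : sampled responses (n_i,) and covariates (n_i, q)
    X_bar    : known population mean of the covariates (q,)
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    n_i = len(y_s)
    y_bar, x_bar = y_s.mean(), x_s.mean(axis=0)
    # Mean of the covariates over the non-sampled units
    x_bar_r = (N_i * np.asarray(X_bar, dtype=float) - n_i * x_bar) / (N_i - n_i)
    resid = y_bar - x_bar @ beta
    shrink = n_i * tau2 / (sigma2 + n_i * tau2)
    log_ratio = (0.5 * np.log1p(n_i * tau2 / sigma2)
                 - n_i * shrink * resid ** 2 / (2.0 * sigma2))
    p_tilde = p / (p + (1.0 - p) * np.exp(log_ratio))
    # Mean of the two-component mixture (2.8)
    mean_r = x_bar_r @ beta + p_tilde * shrink * resid
    return (n_i * y_bar + (N_i - n_i) * mean_r) / N_i
```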


3. Numerical Studies 
3.1. Model based simulations 

In this simulation study, we compared the UNER model with the conventional
NER model in terms of the quality of the estimates. In applying the NER
model, we used the Jeffreys prior on $(\beta, \tau^2, \sigma^2)$, namely
$\pi(\beta, \tau^2, \sigma^2) \propto (\tau^2)^{-1/2}(\sigma^2)^{-1/2}$, for which it is well known that
the resulting posterior distribution is proper (Berger). The full conditional
posterior distributions are given by

$$ v_i \mid \beta, \sigma^2, \tau^2, D \sim \mathcal{N}\left( \frac{n_i\tau^2}{\sigma^2 + n_i\tau^2}(\bar{y}_i - \bar{x}_i^\top\beta), \ \frac{\sigma^2\tau^2}{\sigma^2 + n_i\tau^2} \right), \quad i = 1,\ldots,m, $$
$$ \beta \mid \tau^2, \sigma^2, D \sim \mathcal{N}_q\left( (X^\top\Sigma^{-1}X)^{-1}X^\top\Sigma^{-1}y, \ (X^\top\Sigma^{-1}X)^{-1} \right), $$
$$ \tau^2 \mid v, D \sim IG\left( \tfrac{1}{2}(m - 1), \ \tfrac{1}{2}\sum_{i=1}^m v_i^2 \right), $$
$$ \sigma^2 \mid v, \beta, D \sim IG\left( \tfrac{1}{2}(N - 1), \ \tfrac{1}{2}(y - X\beta - Zv)^\top(y - X\beta - Zv) \right), \tag{3.1} $$

where $\Sigma = \mathrm{diag}(\Sigma_1,\ldots,\Sigma_m)$ with $\Sigma_i = \sigma^2 I_{n_i} + \tau^2 J_{n_i}$. We
considered the following data generating process:

$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + v_i + \varepsilon_{ij}, \quad j = 1,\ldots,n, \quad i = 1,\ldots,m, $$

where $\varepsilon_{ij} \sim \mathcal{N}(0, 1)$, $\beta_0 = 1$, $\beta_1 = 0.5$, and the $x_{ij}$'s were generated
from the uniform distribution on $(1, 2)$ and fixed throughout the simulation
runs. Four combinations of $(n, m)$ were considered, namely
$(n, m) = (5, 20), (5, 40), (10, 20), (10, 40)$. For the true distribution of
$v_i$, we considered the following four scenarios for each choice of $(n, m)$:

S1: $v_i \sim \mathcal{N}(0, (0.7)^2)$,  S2: $v_i \sim 0.3\,\delta_0(v_i) + 0.7\,\mathcal{N}(0, (0.7)^2)$,

S3: $v_i \sim 0.3\,\delta_0(v_i) + 0.7\,\mathcal{L}(0, (0.7)^2)$,  S4: $v_i \sim 0.3\,\delta_0(v_i) + 0.7\,t_6(0, (0.7)^2)$,

where $t_6(a, b)$ and $\mathcal{L}(a, b)$ denote the scaled t-distribution with 6 degrees of
freedom with mean $a$ and variance $b$ and the scaled Laplace distribution with
mean $a$ and variance $b$, respectively. Hence, UNER is misspecified in scenarios
S3 and S4, and overspecified in scenario S1.
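
The four scenarios could be simulated along the following lines. The function
names are hypothetical, and the scalings are derived here rather than taken
from the text: a Laplace distribution with scale $s$ has variance $2s^2$, and a
$t_6$ variable has variance $6/4$, so both are rescaled to variance $(0.7)^2$.

```python
import numpy as np


def draw_random_effects(scenario, m, rng, sd=0.7, p_zero=0.3):
    """Draw v_1, ..., v_m under scenario 'S1', 'S2', 'S3' or 'S4'."""
    if scenario == "S1":
        return rng.normal(0.0, sd, m)
    if scenario == "S2":
        cont = rng.normal(0.0, sd, m)
    elif scenario == "S3":
        cont = rng.laplace(0.0, sd / np.sqrt(2.0), m)    # variance sd^2
    elif scenario == "S4":
        cont = rng.standard_t(6, m) * sd / np.sqrt(1.5)  # variance sd^2
    else:
        raise ValueError("unknown scenario")
    nonzero = rng.random(m) >= p_zero   # point mass at 0 with prob. p_zero
    return np.where(nonzero, cont, 0.0)


def simulate_ner_data(scenario, x, rng, beta0=1.0, beta1=0.5):
    """Generate y_ij = beta0 + beta1 x_ij + v_i + eps_ij for x of shape (m, n)."""
    m, n = x.shape
    v = draw_random_effects(scenario, m, rng)
    eps = rng.normal(0.0, 1.0, (m, n))
    return beta0 + beta1 * x + v[:, None] + eps, v
```

Keeping `x` fixed across calls and passing one generator `rng` mirrors the
design in which the covariates are fixed throughout the simulation runs.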

Based on $R = 1{,}000$ simulation runs, we computed the mean squared error
(MSE), the absolute bias of $\hat{\mu}_i$, and the empirical coverage probability (CP)
of the 95% credible interval of $\mu_i$, which are respectively defined as

$$ \mathrm{MSE} = \frac{1}{mR}\sum_{r=1}^R\sum_{i=1}^m\left( \hat{\mu}_i^{(r)} - \mu_i^{(r)} \right)^2, \qquad \mathrm{Bias} = \frac{1}{mR}\sum_{r=1}^R\sum_{i=1}^m\left| \hat{\mu}_i^{(r)} - \mu_i^{(r)} \right|, $$
$$ \mathrm{CP} = \frac{1}{mR}\sum_{r=1}^R\sum_{i=1}^m I\left( \mu_i^{(r)} \in \mathrm{CI}_i^{(r)} \right) \times 100, $$

where $\hat{\mu}_i^{(r)}$, $\mu_i^{(r)}$ and $\mathrm{CI}_i^{(r)}$ are the posterior mean, the true value,
and the 95% credible interval, respectively, of $\mu_i$ in the $r$th simulation
run. In each simulation run, we used 5,000 posterior samples after 1,000 initial
iterations for both UNER and NER. The results are given in Table 1. In
scenario S1, both the MSE and the absolute bias of UNER are larger than those
of NER since UNER is overspecified. However, as $n$ and $m$ get larger, the
differences get smaller. For the other scenarios, UNER clearly performs better
than NER in terms of MSE and absolute bias, and the differences get larger as
$n$ and $m$ get larger. Finally, the coverage probabilities of the credible
intervals are similar in UNER and NER. Hence, we can conclude that UNER is
expected to be a useful tool when $m$ and $n$ are moderate or large.
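
The three performance measures can be computed directly from arrays of
estimates. In this sketch the function name `simulation_metrics` and the array
layout (rows indexed by simulation run, columns by area) are assumptions made
for illustration.

```python
import numpy as np


def simulation_metrics(mu_hat, mu_true, ci_lower, ci_upper):
    """Return (MSE, absolute bias, CP in percent) over R runs and m areas.

    All inputs are arrays of shape (R, m).
    """
    err = np.asarray(mu_hat) - np.asarray(mu_true)
    mse = float(np.mean(err ** 2))
    bias = float(np.mean(np.abs(err)))
    covered = (np.asarray(ci_lower) <= mu_true) & (mu_true <= np.asarray(ci_upper))
    cp = 100.0 * float(np.mean(covered))
    return mse, bias, cp
```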


3.2. Application to PLP data in Japan 

This example, primarily for illustration, applies the UNER model to the
posted land price (PLP) data along the Keikyu train line in 2001. This
train line connects the suburbs in Kanagawa prefecture to the Tokyo
metropolitan area. Those who live in the suburbs in Kanagawa prefecture take
this line to work in Tokyo every weekday. Thus, it is expected that the land
price depends on the distance from Tokyo. The posted land price data are
available for 52 stations on the Keikyu train line, and we consider each station
as a small area, namely, $m = 52$. For the $i$th station, data of $n_i$ land spots
are available, where $n_i$ varies around 4 and ranges from 1 to 11.

Table 1: Simulated MSE, Bias and Coverage Probabilities (CP) of UNER and NER in Different
Scenarios.

                            UNER                      NER
(n, m)   Scenario    MSE     Bias    CP        MSE     Bias    CP
(3,25)   S1          0.278   0.419   92.3      0.265   0.408   92.3
         S2          0.165   0.308   93.6      0.176   0.320   93.4
         S3          0.156   0.293   93.3      0.166   0.309   93.2
         S4          0.163   0.301   93.9      0.172   0.313   93.8
(3,50)   S1          0.248   0.396   93.2      0.242   0.388   93.2
         S2          0.126   0.252   94.3      0.136   0.267   94.3
         S3          0.128   0.245   93.6      0.140   0.261   93.6
         S4          0.130   0.258   94.6      0.140   0.272   94.3
(6,25)   S1          0.160   0.319   93.7      0.154   0.313   93.7
         S2          0.088   0.215   94.1      0.098   0.235   94.1
         S3          0.088   0.217   93.7      0.103   0.239   93.7
         S4          0.094   0.221   93.8      0.104   0.240   93.8
(6,50)   S1          0.144   0.302   94.3      0.141   0.299   94.3
         S2          0.076   0.206   94.5      0.095   0.229   94.5
         S3          0.071   0.180   94.3      0.091   0.216   94.3
         S4          0.077   0.191   95.1      0.088   0.216   95.1


For $j = 1,\ldots,n_i$, let $y_{ij}$ denote the log-transformed value of the posted
land price (Yen) per square meter of the $j$th spot, $T_i$ the time it takes to
travel from the nearby station $i$ to Tokyo station at around 8:30 in the
morning, $D_{ij}$ the geographical distance from spot $j$ to station $i$, and
$FAR_{ij}$ the floor-area ratio, or ratio of the building volume to the area, at
spot $j$. The values of $T_i$, $D_{ij}$ and $FAR_{ij}$ are also log-transformed.
We applied the UNER model described as

$$ y_{ij} = \beta_0 + FAR_{ij}\beta_1 + T_i\beta_2 + D_{ij}\beta_3 + v_i + \varepsilon_{ij}, $$
$$ v_i \mid (u_i = 1) \sim \mathcal{N}(0, \tau^2), \quad v_i \mid (u_i = 0) \sim \delta_0(v_i), \tag{3.2} $$

where the $\varepsilon_{ij}$'s are independent and identically distributed as
$\mathcal{N}(0, \sigma^2)$. For comparison, we also applied the conventional NER model to
this data set.

In applying the UNER model, we used the prior distribution with $a = 5$ and
$b_1 = \hat{V} + 2$, $b_2 = \hat{V}(\hat{V} + 1)$ for $\hat{V} = 0.031$, as discussed at the end
of Section 2.2. In both models, we generated 100,000 posterior samples after
10,000 iterations of the Gibbs sampling given in (2.7) and (3.1), respectively,
and obtained the posterior means as well as the 95% credible intervals of the
model parameters, which are given in Table 2. Moreover, based on the posterior
samples, we computed the Deviance Information Criterion (DIC) suggested in
Spiegelhalter, Best, Carlin and van der Linde, which is defined as
$\mathrm{DIC} = 2\overline{D(\phi)} - D(\bar{\phi})$, where $\phi$ is a vector of the unknown model
parameters, $D(\phi)$ is $(-2)$ times the log-marginal likelihood function, and
$\overline{D(\phi)}$ and $\bar{\phi}$ denote the posterior means of $D(\phi)$ and $\phi$,
respectively. Note that $\phi = \{\beta, \sigma^2, \tau^2, p\}$ in the UNER model, and
$\phi = \{\beta, \sigma^2, \tau^2\}$ in the NER model. The DIC values are given in Table 2
as well.

Table 2: Posterior Means and Credible Intervals of the Model Parameters, and DIC.

                        $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$    $\sigma^2$   $\tau^2$   $p$     DIC
UNER   95%CI (upper)    15.16   0.24    -0.53   -0.051   0.041   0.071   0.99
       mean             14.55   0.17    -0.61   -0.091   0.033   0.017   0.54    512.6
       95%CI (lower)    13.88   0.11    -0.69   -0.131   0.026   0.002   0.05
NER    95%CI (upper)    15.17   0.24    -0.53   -0.050   0.20    0.117   -
       mean             14.52   0.17    -0.61   -0.089   0.18    0.075   -       703.1
       95%CI (lower)    13.88   0.10    -0.69   -0.132   0.16    0.031   -
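
Given posterior draws, the DIC defined in this subsection reduces to two
averages. The sketch below uses hypothetical function names and assumes the
deviance values have already been evaluated, both at each posterior draw and at
the posterior mean of the parameters; it also returns the quantity
$\overline{D(\phi)} - D(\bar{\phi})$, commonly called the effective number of
parameters in the DIC literature.

```python
import numpy as np


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = 2 * mean(D(phi)) - D(phi_bar)."""
    d_bar = float(np.mean(deviance_draws))
    return 2.0 * d_bar - deviance_at_posterior_mean


def effective_parameters(deviance_draws, deviance_at_posterior_mean):
    """p_D = mean(D(phi)) - D(phi_bar), so that DIC = D(phi_bar) + 2 * p_D."""
    return float(np.mean(deviance_draws)) - deviance_at_posterior_mean
```

A smaller DIC indicates the preferred model, which is how the UNER and NER
values in Table 2 are compared.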

Table 2 shows that the posterior estimates and credible intervals of the
regression coefficients $\beta_0,\ldots,\beta_3$ are similar between UNER and NER, and
in both models, all the credible intervals of the regression coefficients are
bounded away from 0. On the other hand, the results for the variance components
$\sigma^2$ and $\tau^2$ are different because of the effect of the parameter $p$. In
terms of DIC values, the UNER model seems preferable to the conventional NER
model. To see the effects of $u_i$, we calculated the posterior probabilities
$\tilde{p}_i$, which are illustrated in the left panel of Figure 1. The $\tilde{p}_i$'s
change dramatically from area to area, and the $\tilde{p}_i$'s in most areas are
around 0.5, which comes from the posterior mean of $p = 0.54$ as shown in Table 2.

We next considered estimating the land price of a spot with a floor-area ratio
of 100% and a distance of 1000m from station $i$, namely

$$ \mu_i = \beta_0 + FAR_0\beta_1 + T_i\beta_2 + D_0\beta_3 + v_i, $$

for $FAR_0 = \log(100)$ and $D_0 = \log(1000)$. Based on the posterior samples, we
calculated the point estimates $\hat{\mu}_i$ and the posterior standard errors. The
results are given in the right panel of Figure 1, noting that the mean of the posterior
standard errors over all areas in UNER and NER are $6.5 \times 10^{-2}$ and
$6.8 \times 10^{-2}$, respectively. We also computed the lengths of the prediction
intervals of $\mu_i$, and found that the results are similar to those for the
standard errors. Figure 1 shows that UNER provides better estimates than NER in
terms of posterior standard errors in most areas. In some areas, the posterior
standard errors of UNER are larger than those of NER; in these areas the
posterior probability $\tilde{p}_i$ is larger than 0.7, as shown in the left panel of
Figure 1. Thus the uncertain random effects may increase the variability of
predictors compared to the conventional random effects in areas where the
existence of a random effect is strongly supported. This phenomenon was pointed
out in Datta and Mandal for the Fay-Herriot model. However, taking the DIC
values into account as well, the UNER model works well in this application.

Figure 1: Posterior Probability of $u_i = 1$ (Left) and Standard Errors of $\hat{\mu}_i$ (Right) in Each
Area.


3.3. Design based simulation 

We next investigated the numerical performance of the small area prediction
problem in the framework of a finite population. We again used the PLP data in
the Kanto region in 2001, which includes Tokyo, Kanagawa, Chiba and Saitama
prefectures. Thus the data set includes the PLP data along the Keikyu line
used in the previous subsection. The full data set consists of the land price
data with covariates ($T_i$, $D_{ij}$ and $FAR_{ij}$ as used in the previous study),
and each data point has a unique nearest railroad station, which we regard as a
small area. For the $i$th small area ($i = 1,\ldots,m$), there are $N_i$ land spots.
To treat all the observed land price data in each small area as a finite
population, we analyzed only the small areas that have a moderately large number
of data points, namely the areas with $N_i \ge 20$. The resulting number of finite
populations is $m = 30$, and the population sizes $N_i$ range from 20 to 45, with
most $N_i$'s around 25. We artificially constructed sampled data sets and
predicted each finite population mean of the land price by applying UNER. The
sampling scheme is simple random sampling without replacement in each finite
population, and $n_i$ data points are sampled in the $i$th finite population. The
sample sizes $n_i$ are determined by a ratio $0 < \pi < 1$, so that $100\pi$ percent
of the data in each population are sampled; that is, $n_i$ is the nearest integer
to $N_i \times \pi$. We considered four choices for $\pi$, namely
$\pi = 0.3, 0.5, 0.7, 0.9$. In each case, we computed the square root of the mean
squared error (SMSE) for estimators of finite population means as

$$ \mathrm{SMSE}_i = \sqrt{\frac{1}{R}\sum_{r=1}^R\left( \hat{\mu}_i^{(r)} - \bar{Y}_i \right)^2}, $$

where $\hat{\mu}_i^{(r)}$ is the estimator of the finite population mean in the $r$th
replication using UNER or NER, and $R = 1{,}000$ in this study. For both UNER
and NER, we calculated $\hat{\mu}_i^{(r)}$ from 5,000 posterior samples after 1,000
iterations using the method discussed in Section 2.3. In the UNER estimation,
the same form of the prior distribution as in the previous section was used,
namely $a = 5$, $b_1 = \hat{V} + 2$ and $b_2 = \hat{V}(\hat{V} + 1)$ for the estimated
sampling variance $\hat{V}$. To compare the SMSE values of the two models, we then
computed the ratio $\mathrm{SMSE}_i^{UNER}/\mathrm{SMSE}_i^{NER}$, and provide the values in
Figure 2. Figure 2 shows that UNER provides better estimates than NER in some
areas, but worse estimates than NER in several areas for the four cases of
$\pi$. Moreover, the improvement of UNER over NER becomes greater as the sampling
rate $\pi$ gets larger.
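
One replication of this design-based experiment, and the SMSE summary, might be
organized as in the sketch below. The function names are hypothetical, and
Python's built-in `round` is used for "nearest integer", so the treatment of
exact ties is an assumption.

```python
import numpy as np


def draw_sample_indices(N_i, rate, rng):
    """Simple random sampling without replacement of n_i = round(N_i * rate) units."""
    n_i = int(round(N_i * rate))      # tie handling is an assumption
    return rng.choice(N_i, size=n_i, replace=False)


def smse(estimates, true_means):
    """SMSE_i for each area; estimates has shape (R, m), true_means shape (m,)."""
    err = np.asarray(estimates) - np.asarray(true_means)
    return np.sqrt(np.mean(err ** 2, axis=0))


def smse_ratio(est_uner, est_ner, true_means):
    """Ratio SMSE(UNER) / SMSE(NER); values below 1 favor UNER."""
    return smse(est_uner, true_means) / smse(est_ner, true_means)
```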

4. Concluding Remarks 

In this article, we have proposed the use of uncertain random effects in the
nested error regression model for unit-level data, which we call the UNER model.
This can be regarded as an extension of Datta and Mandal. We have used
non-informative priors for all the parameters and proposed Bayesian inference
for the linear combination of fixed effects and random effects as well as the
model parameters. We have shown that the posterior distribution is proper and
the posterior variances exist under some conditions. Through the simulation
study, we have compared the UNER model with the conventional nested error
regression (NER) model. The results show that UNER can provide more accurate
estimates than NER when the underlying distribution of random effects is a
mixture of a point mass on the origin and a continuous distribution.
Moreover, we have applied UNER together with NER to the PLP data and have
found that the UNER model fits better than the NER model in terms of DIC.


Appendix A. Proof of Theorem 2.1

Let $\pi^*$ be the right side of (2.6). For part (a), we shall show that

$$ \int \pi^*(v, u, \beta, \sigma^2, \tau^2, p \mid D)\,dv\,d\beta\,d\sigma^2\,d\tau^2\,dp < \infty, $$

namely the integral for each $u$ is finite. We first prove this for the case
$u = (0,\ldots,0)^\top$. In this case, the integral reduces to

$$ \int (\sigma^2)^{-(N+1)/2}(\tau^2)^{-(b_1+1)}p^{-1/2}(1-p)^{m-1/2}\exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^m\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta)^2 - \frac{b_2}{\tau^2} \right\} d\beta\,d\sigma^2\,d\tau^2\,dp. $$

It is noted that $\int_0^1 p^{-1/2}(1-p)^{m-1/2}\,dp = B(1/2, m + 1/2)$, where $B(a, b)$
is the beta function. Then the integral is finite since the posterior
distribution of the usual linear regression under the Jeffreys prior is proper
if the conditions given in Theorem 2.1 are satisfied.

Figure 2: Square Root Mean Squared Errors of Estimation of Finite Population Mean.
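For the integral in the case $z \ge 1$, using $p^{z-1/2}(1-p)^{m-z-1/2} < 1$, it is
sufficient to show that

$$ \int \pi_u(v, \sigma^2, \tau^2, \beta)\,dv\,d\beta\,d\sigma^2\,d\tau^2 < \infty, \qquad \pi_u(v, \sigma^2, \tau^2, \beta) = \begin{cases} \pi_{u1}(v, \sigma^2, \tau^2, \beta) & (z > a) \\ \pi_{u2}(v, \sigma^2, \tau^2, \beta) & (0 < z \le a) \end{cases} $$

where

$$ \pi_{u1}(v, \sigma^2, \tau^2, \beta) = (\sigma^2)^{-(N+1)/2}(\tau^2)^{-(z+1)/2}\prod_{i=1}^m\left[ \exp\left( -\frac{\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta - u_i v_i)^2}{2\sigma^2} - \frac{u_i v_i^2}{2\tau^2} \right)\delta_0(v_i)^{1-u_i} \right], $$

$$ \pi_{u2}(v, \sigma^2, \tau^2, \beta) = (\sigma^2)^{-(N+1)/2}(\tau^2)^{-z/2-b_1-1}\prod_{i=1}^m\left[ \exp\left( -\frac{\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta - u_i v_i)^2}{2\sigma^2} - \frac{u_i v_i^2}{2\tau^2} \right)\delta_0(v_i)^{1-u_i} \right]\exp\left( -\frac{b_2}{\tau^2} \right). $$
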


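To show the integrability of $\pi_{u1}$ and $\pi_{u2}$, we consider the case of $u$ with
$z = k$. Without loss of generality, we assume that $u_i = 1$ for $i = 1,\ldots,k$
and $u_i = 0$ for $i = k+1,\ldots,m$. Then $\pi_{u1}(v, \sigma^2, \tau^2, \beta)$ can be
rewritten as

$$ \pi_{u1}(v, \sigma^2, \tau^2, \beta) = (\sigma^2)^{-(N+1)/2}(\tau^2)^{-(k+1)/2}\prod_{i=1}^k\left[ \exp\left( -\frac{\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta - v_i)^2}{2\sigma^2} - \frac{v_i^2}{2\tau^2} \right) \right] \times \prod_{i=k+1}^m\left[ \exp\left( -\frac{\sum_{j=1}^{n_i}(y_{ij} - x_{ij}^\top\beta)^2}{2\sigma^2} \right)\delta_0(v_i) \right]. $$

We define the $N$-dimensional vector $s(v_*) = (s_{(1)}^\top, s_{(2)}^\top)^\top$ by
$s_{(1)} = ((y_1 - v_1 1_{n_1})^\top,\ldots,(y_k - v_k 1_{n_k})^\top)^\top$ and
$s_{(2)} = (y_{k+1}^\top,\ldots,y_m^\top)^\top$ for $v_* = (v_1,\ldots,v_k)^\top$.
Then, if $N > q$, we have

$$ \int \pi_{u1}(v, \sigma^2, \tau^2, \beta)\,d\beta \propto (\sigma^2)^{-(N-q-1)/2-1}(\tau^2)^{-(k-1)/2-1}\exp\left( -\frac{s(v_*)^\top A s(v_*)}{2\sigma^2} - \frac{v_*^\top v_*}{2\tau^2} \right)\prod_{i=k+1}^m\delta_0(v_i), $$

where $A = I_N - X(X^\top X)^{-1}X^\top$. The right side is integrable with respect to
$\sigma^2$ and $\tau^2$ since $N > q + 1$ and $k > a \ge 1$, whereby we obtain

$$ \int \pi_{u1}(v, \sigma^2, \tau^2, \beta)\,d\beta\,d\sigma^2\,d\tau^2 \propto \pi_{u1}(v_*)\prod_{i=k+1}^m\delta_0(v_i), $$

where

$$ \pi_{u1}(v_*) = \left\{ s(v_*)^\top A s(v_*) \right\}^{-(N-q-1)/2}\left( v_*^\top v_* \right)^{-(k-1)/2}. $$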

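In what follows, we show that $\pi_{u1}(v_*)$ is integrable. To this end, we note that

$$ \int \pi_{u1}(v_*)\,dv_* = \int_{v_*^\top v_* \le 1}\pi_{u1}(v_*)\,dv_* + \int_{v_*^\top v_* > 1}\pi_{u1}(v_*)\,dv_*, $$

and we evaluate the two integrals separately. For the first term, since $A$ is
idempotent and $\mathrm{rank}(A) = N - q\ (> 0)$, there exists $c(y) > 0$ such that
$s(v_*)^\top A s(v_*) > c(y)$ for all $v_*$. Then we have

$$ \int_{v_*^\top v_* \le 1}\pi_{u1}(v_*)\,dv_* \le c(y)^{-(N-q-1)/2}\int_{v_*^\top v_* \le 1}\left( v_*^\top v_* \right)^{-(k-1)/2}dv_* = c(y)^{-(N-q-1)/2}\,V(S^k)\int_0^1 (r^2)^{-(k-1)/2}(r^2)^{(k-1)/2}\,dr < \infty, $$

where $V(S^k)$ is the volume of the unit sphere in $R^k$. For the second term, it
follows that

$$ \int_{v_*^\top v_* > 1}\pi_{u1}(v_*)\,dv_* = \int_{v_*^\top v_* > 1}\left\{ s(v_*)^\top A s(v_*) \right\}^{-(N-q-1)/2}\left( v_*^\top v_* \right)^{-(k-1)/2}dv_*. $$

Since $s(v_*)^\top A s(v_*)$ is a quadratic function of $v_*$, the integral is finite as
far as $N > q + 2$. For the integrability of $\pi_{u2}$, we carry out integration with
respect to $\beta$, $\sigma^2$ and $\tau^2$ to get

$$ \int \pi_{u2}(v, \sigma^2, \tau^2, \beta)\,d\beta\,d\sigma^2\,d\tau^2 \propto \pi_{u2}(v_*)\prod_{i=k+1}^m\delta_0(v_i). $$

Since for $N > q + 1$,

$$ \pi_{u2}(v_*) = \left\{ s(v_*)^\top A s(v_*) \right\}^{-(N-q-1)/2}\left( v_*^\top v_* + 2b_2 \right)^{-k/2-b_1} \le c(y)^{-(N-q-1)/2}(2b_2)^{-k/2-b_1}, $$

it follows that $\pi_{u2}(v_*)$ is integrable as far as $N > q + 1$. Thus the proof of
part (a) is established.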

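For the proof of part (b), it is sufficient to show that the posterior second
moments are finite. Since the statement for $p$ is clear, we establish the result
for $\beta$, $\sigma^2$ and $\tau^2$. As in the proof of part (a), we consider the three
cases where $z > a$, $0 < z \le a$ and $z = 0$, and write $\pi_{u3}$ for the integrand
in the case $z = 0$. By replacing $q + 1$ in the expressions of $\pi_{u1}$, $\pi_{u2}$
and $\pi_{u3}$ with $q + 5$, it follows that $E[(\sigma^2)^2 \mid D] < \infty$ when
$N > q + 6$.

For $E[\beta\beta^\top \mid D]$, we first note that

$$ \int \beta\beta^\top\exp\left( -\frac{(s(v_*) - X\beta)^\top(s(v_*) - X\beta)}{2\sigma^2} \right)d\beta = (\sigma^2)^{(q+2)/2}h(X, s(v_*), \sigma^2) + (\sigma^2)^{q/2}h(X, s(v_*), \sigma^2)X^\top s(v_*)s(v_*)^\top X(X^\top X)^{-1}, $$

where

$$ h(X, s(v_*), \sigma^2) = (2\pi)^{q/2}|X^\top X|^{-1/2}\exp\left( -\frac{s(v_*)^\top A s(v_*)}{2\sigma^2} \right)(X^\top X)^{-1}. $$

Then it follows that
$\int \beta\beta^\top\pi_{u1}(v, \beta, \sigma^2, \tau^2)\,dv\,d\beta\,d\sigma^2\,d\tau^2$ is finite if both

$$ I_q\int_{R^k}\left\{ s(v_*)^\top A s(v_*) \right\}^{-(N-q-3)/2}\left( v_*^\top v_* \right)^{-(k-1)/2}dv_* \quad \text{and} \quad \int_{R^k}X^\top s(v_*)s(v_*)^\top X\,\pi_{u1}(v_*)\,dv_* $$

are finite. Since $X^\top s(v_*)s(v_*)^\top X$ is bounded in terms of
$(v_*^\top v_*)I_q$, the second term is finite if $k > 5$ for all $k > a$, namely
$a \ge 5$. The first term is also finite as far as $N > q + 4$. For the other cases
$0 < z \le a$ and $z = 0$, we can similarly show that
$\int \beta\beta^\top\pi_{u2}(v, \beta, \sigma^2, \tau^2)\,dv\,d\beta\,d\sigma^2\,d\tau^2$ and
$\int \beta\beta^\top\pi_{u3}(v, \beta, \sigma^2, \tau^2)\,dv\,d\beta\,d\sigma^2\,d\tau^2$ are finite under the
condition given in Theorem 2.1.

Finally, for $E[(\tau^2)^2 \mid D]$, it follows that

$$ \int (\tau^2)^2\pi_{u1}(v, \beta, \sigma^2, \tau^2)\,dv\,d\beta\,d\sigma^2\,d\tau^2 \propto \int (\sigma^2)^{-(N-q-1)/2-1}(\tau^2)^{-(k-5)/2-1}\exp\left( -\frac{s(v_*)^\top A s(v_*)}{2\sigma^2} - \frac{v_*^\top v_*}{2\tau^2} \right)dv\,d\sigma^2\,d\tau^2 $$
$$ \propto \int \left\{ s(v_*)^\top A s(v_*) \right\}^{-(N-q-1)/2}\left( v_*^\top v_* \right)^{-(k-5)/2}dv_*, $$

which is finite as far as $k > 5$ for all $k > a$, namely $a \ge 5$. In the cases of
$0 < z \le a$ and $z = 0$, it is integrable if

$$ \int (\tau^2)^2\pi_*(\tau^2)\,d\tau^2 < \infty, $$

which can be established since $b_1 > 3$. Thus we complete the proof of part (b).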